Muhamad Ameer Noor
December 2, 2023 (modified December 3, 2023)

Code
pacman::p_load(tmap, sf, tidyverse, sfdep, knitr, Hmisc, mapview, DT)

As cities upgrade to digital systems, like smart buses and GPS, a lot of data is created about how people move around. This data is super useful for understanding patterns in both space (where) and time (when). For example, on public buses, smart cards and GPS devices collect information about routes and how many people are riding. This data holds clues about how people behave in the city, and understanding these patterns can help manage the city better and give an edge to transportation providers.
But, often, this valuable data is only used for basic things like tracking and mapping using Geographic Information System (GIS) applications. This is because regular GIS tools struggle to fully analyze and model this kind of complex spatial and time-related data.
In this exercise, we’re going to focus on bus trips during specific peak hours:
| Peak hour period | Bus tap on time |
|---|---|
| Weekday morning peak | 6am to 9am |
| Weekday evening peak | 5pm to 8pm |
| Weekend/holiday morning peak | 11am to 2pm |
| Weekend/holiday evening peak | 4pm to 7pm |
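These cut-offs can be expressed as a small helper. Below is a minimal base-R sketch (the function name `classify_peak` is illustrative, not from the original code); `day_type` follows the dataset's `DAY_TYPE` coding and `hour` is the tap-on hour (0–23):

```r
# Base-R sketch of the peak-period cut-offs above (classify_peak is an
# illustrative name, not from the original code)
classify_peak <- function(day_type, hour) {
  if (day_type == "WEEKDAY") {
    if (hour >= 6 && hour <= 9) return("Weekday morning peak")
    if (hour >= 17 && hour <= 20) return("Weekday evening peak")
  } else {
    if (hour >= 11 && hour <= 14) return("Weekend/holiday morning peak")
    if (hour >= 16 && hour <= 19) return("Weekend/holiday evening peak")
  }
  "non-peak"
}
```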
To better understand these movement patterns and behaviors, we’re going to use something called Exploratory Spatial Data Analysis (ESDA). It’s like a powerful tool that helps us tackle complex challenges in society. In this project, we’re going to use specific methods like Local Indicators of Spatial Association (LISA) that will help us dig deep into the data and uncover valuable insights.
Local Indicators of Spatial Association (LISA) statistics serve two primary purposes. Firstly, they act as indicators of local pockets of nonstationarity, or hotspots. Secondly, they are used to assess the influence of individual locations on the magnitude of a global statistic. In simpler terms, LISA helps in identifying specific areas (like hotspots) that differ significantly from their surrounding areas and evaluates how these individual areas contribute to the overall pattern observed in a larger region.
The following panel explains the aspatial and geospatial data used in this project.
Monthly Passenger Volume by Origin Destination Bus Stops:
August, September and October 2023 Period
downloaded from LTA DataMall via API
csv format.
Columns/fields in the dataset include YEAR_MONTH, DAY_TYPE, TIME_PER_HOUR, PT_TYPE, ORIGIN_PT_CODE, DESTINATION_PT_CODE, and TOTAL_TRIPS.
Two geospatial data sets in shp format are used in this project:
Bus Stop Location from LTA DataMall. It provides information about all the bus stops currently being serviced by buses, including the bus stop code (identifier) and location coordinates.
Hexagon, a hexagon layer with a 500m apothem (the perpendicular distance between the centre of the hexagon and its edges). Each spatial unit is regular in shape and finer than the Master Plan 2019 Planning Sub-zone GIS data set of URA.
Uniform Distances Everywhere: Think of hexagons as honeycomb cells. Each cell (hexagon) touches its neighbors at the same distance from its center. It’s like standing in the middle of a room and being the same distance from every wall, making it easier to measure and compare things.
Outlier-Free Shape: Hexagons are like well-rounded polygons without any pointy tips. Sharp corners can create odd spots in data, but hexagons smoothly cover space without sticking out anywhere. This helps prevent weird data spikes that don’t fit the pattern.
Consistent Spatial Relationships: Imagine a beehive where every hexagon is surrounded by others in the same pattern. This regular pattern is great for analyzing data because you can expect the same relationships everywhere, making the data predictable and easier to work with.
Ideal for Non-Perpendicular Features: Real-world features like rivers and roads twist and turn. Squares can be awkward for mapping these, but hexagons, which are more circular, can follow their flow better. This way, a hexagon-based map can mimic the real world more closely than a checkerboard of squares.
Summarized from: Dhuri, and Sekste & Kazakov.
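The 500m figure above is the hexagon's apothem (centre-to-edge distance); the centre-to-vertex distance, side length, and cell area follow from basic trigonometry, as this small check shows:

```r
# Geometry of a regular hexagon with a 500 m apothem (centre-to-edge
# distance): the circumradius (centre to vertex) and the side length
# both equal apothem / cos(30 degrees)
apothem      <- 500
circumradius <- apothem / cos(pi / 6)      # ~577.35 m, centre to vertex
side_length  <- 2 * apothem * tan(pi / 6)  # ~577.35 m, equals the circumradius
area_m2      <- 2 * sqrt(3) * apothem^2    # ~866,025 m^2 per hexagon cell
```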
Before starting the analysis, we have to load the libraries (installing any that are not yet in the environment), import the data, and process it.
Explanations for the imported libraries:
tmap for visualizing geospatial data
sf for handling geospatial data
tidyverse for handling aspatial data
sfdep for computing spatial weights and global and local spatial autocorrelation statistics
knitr for creating html tables
Hmisc for summary statistics
mapview for interactive map backgrounds
DT for creating interactive html tables
The following code imports all the origin-destination bus data and checks a sample dataframe. The process involves:
importing the csv file using the read_csv function from the readr package
using mutate from the dplyr package to convert georeference columns into factors, easing compatibility issues and enabling more efficient processing
using describe from the Hmisc package to display summary statistics of the dataset
# Load each csv file
odb8 <- read_csv("../data/aspatial/origin_destination_bus_202308.csv.gz")
# change georeference data type into factors
odb8 <- odb8 %>%
mutate(
ORIGIN_PT_CODE = as.factor(ORIGIN_PT_CODE),
DESTINATION_PT_CODE = as.factor(DESTINATION_PT_CODE)
)
# check the dataframe
describe(odb8)
odb8
7 Variables 5709512 Observations
--------------------------------------------------------------------------------
YEAR_MONTH
n missing distinct value
5709512 0 1 2023-08
Value 2023-08
Frequency 5709512
Proportion 1
--------------------------------------------------------------------------------
DAY_TYPE
n missing distinct
5709512 0 2
Value WEEKDAY WEEKENDS/HOLIDAY
Frequency 3279836 2429676
Proportion 0.574 0.426
--------------------------------------------------------------------------------
TIME_PER_HOUR
n missing distinct Info Mean Gmd .05 .10
5709512 0 24 0.997 14.07 5.949 6 7
.25 .50 .75 .90 .95
10 14 18 21 22
lowest : 0 1 2 3 4, highest: 19 20 21 22 23
--------------------------------------------------------------------------------
PT_TYPE
n missing distinct value
5709512 0 1 BUS
Value BUS
Frequency 5709512
Proportion 1
--------------------------------------------------------------------------------
ORIGIN_PT_CODE
n missing distinct
5709512 0 5067
lowest : 01012 01013 01019 01029 01039, highest: 99139 99161 99171 99181 99189
--------------------------------------------------------------------------------
DESTINATION_PT_CODE
n missing distinct
5709512 0 5071
lowest : 01012 01013 01019 01029 01039, highest: 99139 99161 99171 99181 99189
--------------------------------------------------------------------------------
TOTAL_TRIPS
n missing distinct Info Mean Gmd .05 .10
5709512 0 3544 0.982 21.04 33.59 1 1
.25 .50 .75 .90 .95
2 4 13 38 74
lowest : 1 2 3 4 5, highest: 30799 31669 32508 33424 35049
--------------------------------------------------------------------------------
# Load each csv file
odb9 <- read_csv("../data/aspatial/origin_destination_bus_202309.csv.gz")
# change georeference data type into factors
odb9 <- odb9 %>%
mutate(
ORIGIN_PT_CODE = as.factor(ORIGIN_PT_CODE),
DESTINATION_PT_CODE = as.factor(DESTINATION_PT_CODE)
)
# check the dataframe
describe(odb9)
odb9
7 Variables 5714196 Observations
--------------------------------------------------------------------------------
YEAR_MONTH
n missing distinct value
5714196 0 1 2023-09
Value 2023-09
Frequency 5714196
Proportion 1
--------------------------------------------------------------------------------
DAY_TYPE
n missing distinct
5714196 0 2
Value WEEKDAY WEEKENDS/HOLIDAY
Frequency 3183300 2530896
Proportion 0.557 0.443
--------------------------------------------------------------------------------
TIME_PER_HOUR
n missing distinct Info Mean Gmd .05 .10
5714196 0 23 0.997 14.06 5.94 6 7
.25 .50 .75 .90 .95
10 14 18 21 22
lowest : 0 1 2 4 5, highest: 19 20 21 22 23
--------------------------------------------------------------------------------
PT_TYPE
n missing distinct value
5714196 0 1 BUS
Value BUS
Frequency 5714196
Proportion 1
--------------------------------------------------------------------------------
ORIGIN_PT_CODE
n missing distinct
5714196 0 5067
lowest : 01012 01013 01019 01029 01039, highest: 99139 99161 99171 99181 99189
--------------------------------------------------------------------------------
DESTINATION_PT_CODE
n missing distinct
5714196 0 5072
lowest : 01012 01013 01019 01029 01039, highest: 99139 99161 99171 99181 99189
--------------------------------------------------------------------------------
TOTAL_TRIPS
n missing distinct Info Mean Gmd .05 .10
5714196 0 3274 0.981 19.59 31.01 1 1
.25 .50 .75 .90 .95
2 4 12 35 69
lowest : 1 2 3 4 5, highest: 27356 28248 28510 30096 31674
--------------------------------------------------------------------------------
# Load each csv file
odb10 <- read_csv("../data/aspatial/origin_destination_bus_202310.csv.gz")
# change georeference data type into factors
odb10 <- odb10 %>%
mutate(
ORIGIN_PT_CODE = as.factor(ORIGIN_PT_CODE),
DESTINATION_PT_CODE = as.factor(DESTINATION_PT_CODE)
)
# check the dataframe
describe(odb10)
odb10
7 Variables 5694297 Observations
--------------------------------------------------------------------------------
YEAR_MONTH
n missing distinct value
5694297 0 1 2023-10
Value 2023-10
Frequency 5694297
Proportion 1
--------------------------------------------------------------------------------
DAY_TYPE
n missing distinct
5694297 0 2
Value WEEKDAY WEEKENDS/HOLIDAY
Frequency 3259419 2434878
Proportion 0.572 0.428
--------------------------------------------------------------------------------
TIME_PER_HOUR
n missing distinct Info Mean Gmd .05 .10
5694297 0 23 0.997 14.04 5.933 6 7
.25 .50 .75 .90 .95
10 14 18 21 22
lowest : 0 1 2 4 5, highest: 19 20 21 22 23
--------------------------------------------------------------------------------
PT_TYPE
n missing distinct value
5694297 0 1 BUS
Value BUS
Frequency 5694297
Proportion 1
--------------------------------------------------------------------------------
ORIGIN_PT_CODE
n missing distinct
5694297 0 5073
lowest : 01012 01013 01019 01029 01039, highest: 99139 99161 99171 99181 99189
--------------------------------------------------------------------------------
DESTINATION_PT_CODE
n missing distinct
5694297 0 5077
lowest : 01012 01013 01019 01029 01039, highest: 99139 99161 99171 99181 99189
--------------------------------------------------------------------------------
TOTAL_TRIPS
n missing distinct Info Mean Gmd .05 .10
5694297 0 3495 0.982 20.76 33.13 1 1
.25 .50 .75 .90 .95
2 4 12 37 73
lowest : 1 2 3 4 5, highest: 30985 31349 32355 35931 36668
--------------------------------------------------------------------------------
Explanations for each field in the data can be found in the following metadata.
YEAR_MONTH: Represents the year and month in which the data was collected. Since each file covers one month, only one unique value exists per data frame.
DAY_TYPE: Represents the type of day, classified as weekdays or weekends/holidays.
TIME_PER_HOUR: Hour on which the passenger trip is based, in intervals from 0 to 23.
PT_TYPE: Type of public transport. Since these are bus data sets, only one unique value exists per data frame (i.e. BUS).
ORIGIN_PT_CODE: ID of origin bus stop
DESTINATION_PT_CODE: ID of destination bus stop
TOTAL_TRIPS: Number of trips
The following panel shows the import process for the bus stop geospatial data. The geospatial layer shows the point location of bus stops across Singapore.
The geospatial data is imported using the st_read function from the sf package. As before, categorical columns that can serve as references are converted into factors to ease compatibility issues and enable more efficient processing. The imported dataset is then checked using the glimpse function from the dplyr package, which shows the columns, a sample of the values, and the data types.
Reading layer `BusStop' from data source `C:\ameernoor\ISSS624\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
Rows: 5,161
Columns: 4
$ BUS_STOP_N <fct> 22069, 32071, 44331, 96081, 11561, 66191, 23389, 54411, 285…
$ BUS_ROOF_N <fct> B06, B23, B01, B05, B05, B03, B02A, B02, B09, B01, B16, B02…
$ LOC_DESC <fct> OPP CEVA LOGISTICS, AFT TRACK 13, BLK 239, GRACE INDEPENDEN…
$ geometry <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…
To see more of the data characteristics, the following panel shows the number of distinct values in each column, using the [n_distinct](https://dplyr.tidyverse.org/reference/n_distinct.html) function from the dplyr package.
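The panel's code is not reproduced here; the underlying operation can be sketched in base R on a toy table (hypothetical values), since dplyr's n_distinct(x) is essentially length(unique(x)):

```r
# Toy illustration (hypothetical rows): count distinct values per column,
# mirroring what n_distinct does across the busstop attribute table
busstop_attrs <- data.frame(
  BUS_STOP_N = c("22069", "32071", "22069"),
  BUS_ROOF_N = c("B06", "B23", "B06"),
  stringsAsFactors = FALSE
)
distinct_counts <- vapply(busstop_attrs, function(col) length(unique(col)), integer(1))
```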
Explanations for each field in the data can be found in the following metadata.
To ensure that geospatial data from different sources can be processed together, both have to be projected in the same coordinate system and assigned the correct EPSG code for that CRS. The st_crs function is used to check the EPSG code and coordinate system of both geospatial files. For Singapore, the coordinate system is SVY21 with EPSG 3414 (source: epsg.io). The following code chunk output shows how the CRS was initially not 3414, then corrected using st_set_crs. Both functions are from the sf package. At the same time, the dataframe is prepared for the joining process, which involves standardizing the data type to factor and applying the tolower function from base R.
Current CRS of busstop dataset:
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
Assign new EPSG code
# Assign a new EPSG code (Spatial Reference ID)
busstop <- st_set_crs(
busstop,
3414
) %>%
# Rename the bus stop origin column for easier joining with the main dataframe
mutate(
ORIGIN_PT_CODE = as.factor(BUS_STOP_N)
) %>%
select(
ORIGIN_PT_CODE,
LOC_DESC,
geometry
) %>%
# Change all column names to lowercase for consistency
rename_with(
tolower, everything()
)
Confirm the updated EPSG code after the assignment
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
In this step, we inspect the dataset for duplicate entries. Repeated records would inflate passenger trip counts, so identifying and addressing duplicates now keeps the subsequent analysis accurate and reliable.
The following code checks for duplicates and shows how many exist in each odb dataframe.
# Count the number of rows in each dataframe with duplicates
count_duplicate_rows <- function(df, df_name) {
df %>%
group_by_all() %>%
filter(n() > 1) %>%
ungroup() %>%
summarise(df_name = df_name, row_count = n())
}
# Apply the function to each dataframe
duplicate1 <- count_duplicate_rows(odb8, "odb8")
duplicate2 <- count_duplicate_rows(odb9, "odb9")
duplicate3 <- count_duplicate_rows(odb10, "odb10")
# Combine the results
all_duplicates <- bind_rows(duplicate1, duplicate2, duplicate3)
# Print the counts
print(all_duplicates)
# A tibble: 3 × 2
df_name row_count
<chr> <int>
1 odb8 0
2 odb9 0
3 odb10 0
The result shows that there are no duplicated rows in the origin-destination bus dataset. Note that the check used the strictest definition of a duplicate: rows must have the same value across all columns.
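The same "all columns identical" criterion can be checked with base R's duplicated(); a toy example with hypothetical rows:

```r
# Count rows that have an exact twin across every column, matching the
# group_by_all() + filter(n() > 1) logic above
df <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"))
dup_mask <- duplicated(df) | duplicated(df, fromLast = TRUE)
n_dup_rows <- sum(dup_mask)  # both members of a duplicate pair are counted
```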
The following table lists bus stops in the geospatial dataset that share the same origin_pt_code.
| origin_pt_code | loc_desc | geometry | |
|---|---|---|---|
| 149 | 58031 | OPP CANBERRA DR | POINT (27089.69 47570.9) |
| 166 | 62251 | Bef Blk 471B | POINT (35500.54 39943.41) |
| 278 | 47201 | NA | POINT (22616.75 47793.68) |
| 338 | 58031 | OPP CANBERRA DR | POINT (27111.07 47517.77) |
| 501 | 22501 | Blk 662A | POINT (13489.09 35536.4) |
| 751 | 82221 | BLK 3A | POINT (35323.6 33257.05) |
| 1321 | 68091 | AFT BAKER ST | POINT (32164.11 42695.98) |
| 1609 | 43709 | BLK 644 | POINT (18963.42 36762.8) |
| 1937 | 97079 | OPP ST. JOHN’S CRES | POINT (44144.57 38980.25) |
| 2035 | 82221 | Blk 3A | POINT (35308.74 33335.17) |
| 2038 | 97079 | OPP ST. JOHN’S CRES | POINT (44055.75 38908.5) |
| 2092 | 22501 | BLK 662A | POINT (13488.02 35537.88) |
| 2237 | 62251 | BEF BLK 471B | POINT (35500.36 39943.34) |
| 2656 | 68099 | BEF BAKER ST | POINT (32154.9 42742.82) |
| 2773 | 52059 | OPP BLK 65 | POINT (30770.3 34460.06) |
| 2970 | 67421 | CHENG LIM STN EXIT B | POINT (34548.54 42052.15) |
| 3126 | 11009 | Ghim Moh Ter | POINT (23101.34 32594.17) |
| 3156 | 53041 | Upp Thomson Road | POINT (28105.8 37246.76) |
| 3158 | 53041 | Upp Thomson Road | POINT (27956.34 37379.29) |
| 3255 | 77329 | BEF PASIR RIS ST 53 | POINT (40765.35 39452.18) |
| 3261 | 77329 | Pasir Ris Central | POINT (40728.15 39438.15) |
| 3264 | 96319 | Yusen Logistics | POINT (42187.23 34995.78) |
| 3265 | 96319 | YUSEN LOGISTICS | POINT (42187.23 34995.78) |
| 3303 | 52059 | BLK 219 | POINT (30565.45 36133.15) |
| 3411 | 43709 | BLK 644 | POINT (18952.02 36751.83) |
| 3470 | 51071 | MACRITCHIE RESERVOIR | POINT (28311.27 36036.92) |
| 3472 | 51071 | MACRITCHIE RESERVOIR | POINT (28282.54 36033.93) |
| 3665 | 11009 | GHIM MOH TER | POINT (23100.58 32604.36) |
| 3752 | 68091 | AFT BAKER ST | POINT (32038.84 43298.68) |
| 3753 | 68099 | BEF BAKER ST | POINT (32004.05 43320.34) |
| 4624 | 47201 | W’LANDS NTH STN | POINT (22632.92 47934) |
| 5095 | 67421 | CHENG LIM STN EXIT B | POINT (34741.77 42004.21) |
The result shows that there are duplicates in the geospatial bus stop dataset. These might interfere with the join and produce double counting later on. Note that the check was based on the origin_pt_code field only, because it will be the reference for joining the datasets later. The table above also shows that the duplicated codes are located near each other, so removing the duplicates can be considered safe: the remaining bus stop can represent the deleted one. The following code chunk executes the duplicate removal and shows the result, where the number of rows has decreased.
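The removal chunk itself is not reproduced in the rendered output; one plausible approach (a sketch, not necessarily the author's exact code) is to keep the first bus stop per code, shown here in base R on toy rows. The glimpse output below reflects the actual dataset.

```r
# Keep the first row per bus stop code; dplyr users would reach for
# distinct(origin_pt_code, .keep_all = TRUE) instead
dedup_first <- function(df, key) df[!duplicated(df[[key]]), , drop = FALSE]

stops <- data.frame(
  origin_pt_code = c("58031", "58031", "62251"),
  loc_desc = c("OPP CANBERRA DR", "OPP CANBERRA DR", "BEF BLK 471B")
)
stops_unique <- dedup_first(stops, "origin_pt_code")
```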
Rows: 5,145
Columns: 3
$ origin_pt_code <fct> 22069, 32071, 44331, 96081, 11561, 66191, 23389, 54411,…
$ loc_desc <fct> OPP CEVA LOGISTICS, AFT TRACK 13, BLK 239, GRACE INDEPE…
$ geometry <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.…
In the interest of analyzing peak times, the following code assigns a new column that categorizes the peak period, filters out records outside peak times, and shows a sample of the output.
# Function to categorize and aggregate trips
categorize_and_aggregate <- function(odb) {
odb <- odb %>%
# Categorize trips under periods based on day and timeframe
mutate(period = case_when(
DAY_TYPE == "WEEKDAY" & TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9 ~ "Weekday morning peak",
DAY_TYPE == "WEEKDAY" & TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20 ~ "Weekday evening peak",
DAY_TYPE == "WEEKENDS/HOLIDAY" & TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14 ~ "Weekend/holiday morning peak",
DAY_TYPE == "WEEKENDS/HOLIDAY" & TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19 ~ "Weekend/holiday evening peak",
TRUE ~ "non-peak"
)) %>%
# Only retain needed periods for analysis
filter(period != "non-peak") %>%
# Compute the number of trips per origin bus stop per month for each period
group_by(YEAR_MONTH, period, ORIGIN_PT_CODE) %>%
summarise(num_trips = sum(TOTAL_TRIPS)) %>%
# Change all column names to lowercase
rename_with(tolower, everything()) %>%
ungroup()
return(odb)
}
# Apply the function to each dataset
odb8 <- categorize_and_aggregate(odb8)
odb9 <- categorize_and_aggregate(odb9)
odb10 <- categorize_and_aggregate(odb10)
# sample the result
glimpse(odb10)
Rows: 20,072
Columns: 4
$ year_month <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-10", …
$ period <chr> "Weekday evening peak", "Weekday evening peak", "Weekda…
$ origin_pt_code <fct> 01012, 01013, 01019, 01029, 01039, 01059, 01109, 01112,…
$ num_trips <dbl> 8000, 7038, 3398, 9089, 12095, 2212, 276, 43513, 25769,…
To decide which month is better for performing LISA, or whether analyzing each month separately would yield interesting results, it is a good step to check the data distribution of each month. By comparing the distributions, we can make an educated guess as to whether the LISA results for each month would be starkly different or broadly similar.
# Combine odb8, odb9, and odb10 into one dataframe
combined_data <- bind_rows(
mutate(odb8, month = "odb8"),
mutate(odb9, month = "odb9"),
mutate(odb10, month = "odb10")
)
# Create a standard vertical box plot
bus_boxplot <- combined_data %>%
ggplot(aes(x = period, y = num_trips, fill = period)) + # Assigning aesthetics for x and y axes, and fill color based on period
geom_boxplot() + # Adding the box plot layer
facet_wrap(~month, scales = 'free_x') + # Faceting by month, with free scales for x axis
labs(
title = "Distribution of Trips across Peak Periods",
subtitle = "Comparison across different months",
x = "Period",
y = "Number of Trips"
) +
theme_minimal() + # Using a minimal theme for a cleaner look
theme(axis.text.x = element_blank(), axis.title.x = element_blank()) # Removing x-axis category labels and label
bus_boxplot
The box plot of the raw distribution was not too helpful, as it shows that every peak period in every month has extreme outliers. Therefore, the next code chunk modifies the box plot by log-transforming the number of trips.
# Create a modified vertical box plot
bus_boxplot <- combined_data %>%
ggplot(aes(x = period, y = log(num_trips), fill = period)) + # Modified aesthetics with log-transformed y-axis
geom_boxplot() + # Adding the box plot layer
facet_wrap(~month, scales = 'free_x') + # Faceting by month, with free scales for x axis
labs(
title = "Distribution of Log-Transformed Trips",
subtitle = "Comparison across different months",
y = "Log(Number of Trips)"
) +
theme_minimal() + # Using a minimal theme for a cleaner look
theme(axis.text.x = element_blank(), axis.title.x = element_blank()) # Removing x-axis category labels and label
bus_boxplot
The log-transformed box plot shows that the distribution of each peak period is quite similar across months. In general, the number of trips in the weekday peaks is typically higher than in the weekend/holiday peaks. The similarity extends to extreme outliers; for example, the Weekday morning peak box plot always has a single extreme outlier at the top. Based on this observation, performing LISA on one month is most likely enough to discover the pattern. October, the latest month available, is chosen for the subsequent processes.
# Extract data from odb10 and store as a separate dataframe
odb10_wide <- odb10 %>%
# Pivot the data wider, treating each row as a bus stop code with peak period trips as columns
pivot_wider(
names_from = period,
values_from = num_trips
) %>%
select(2:6)
DT::datatable(odb10_wide,
options = list(pageLength = 8),
rownames = FALSE)
Rows: 5,072
Columns: 5
$ origin_pt_code <fct> 01012, 01013, 01019, 01029, 01039, 0105…
$ `Weekday evening peak` <dbl> 8000, 7038, 3398, 9089, 12095, 2212, 27…
$ `Weekday morning peak` <dbl> 1770, 841, 1530, 2483, 2919, 1734, 200,…
$ `Weekend/holiday evening peak` <dbl> 3061, 2770, 1685, 4063, 7263, 1118, 101…
$ `Weekend/holiday morning peak` <dbl> 2177, 1818, 1536, 3217, 5408, 1159, 116…
To retain the geospatial property, left_join is applied with busstop as the main dataframe and the bus stop code as the join key.
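The join chunk is not shown in the rendered output; its behaviour can be illustrated in base R on toy rows (hypothetical counts). merge() with all.x = TRUE mirrors dplyr's left_join, keeping every bus stop and filling missing trip counts with NA. The glimpse output below reflects the actual dataset.

```r
stops <- data.frame(origin_pt_code = c("22069", "32071"),
                    loc_desc = c("OPP CEVA LOGISTICS", "AFT TRACK 13"))
trips <- data.frame(origin_pt_code = "22069",
                    `Weekday morning peak` = 50, check.names = FALSE)
# Left join: every bus stop is retained; stops without trips get NA
odb_joined <- merge(stops, trips, by = "origin_pt_code", all.x = TRUE)
```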
Rows: 5,145
Columns: 7
$ origin_pt_code <fct> 22069, 32071, 44331, 96081, 11561, 6619…
$ loc_desc <fct> OPP CEVA LOGISTICS, AFT TRACK 13, BLK 2…
$ `Weekday evening peak` <dbl> 67, 5, 1740, 445, 199, 349, 432, 1285, …
$ `Weekday morning peak` <dbl> 50, NA, 2075, 411, 207, 405, 60, 3059, …
$ `Weekend/holiday evening peak` <dbl> 10, NA, 589, 47, 77, 169, 82, 492, 2016…
$ `Weekend/holiday morning peak` <dbl> 8, NA, 682, 110, 70, 177, 16, 1442, 204…
$ geometry <POINT [m]> POINT (13576.31 32883.65), POINT …
odb10_sf represents a spatial point dataframe, which might not be optimal for spatial autocorrelation analysis where the definition of ‘areas’ requires the representation of spatial entities as polygons instead of individual points. To address this, the subsequent code sections transform these bus stops into a grid of hexagons.
The process uses:
the st_make_grid() function to create a hexagon grid.
the cellsize parameter, measured in the same units as the dataframe’s spatial reference system, to determine the width of each hexagon. Given that odb10_sf is projected in EPSG code 3414 with metres as the unit, the hexagon width is set to 500m.
hex_id, which serves as the primary key for future reference.
Output: Spatial Polygons with Hexagonal Geometry and hex_id Identification
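The grid-creation chunk only appears as output below. Based on the description, a reconstruction might look like the following, sketched on toy points since the original data is not included (the exact cellsize semantics for hexagonal grids should be verified against the sf documentation):

```r
library(sf)
# Build a hexagon grid over a point layer and tag each cell with a hex_id
# primary key (toy points stand in for odb10_sf)
pts <- st_as_sf(data.frame(x = c(0, 1200), y = c(0, 1200)),
                coords = c("x", "y"))
odb10_hex <- st_sf(geometry = st_make_grid(pts, cellsize = 500, square = FALSE))
odb10_hex$hex_id <- seq_len(nrow(odb10_hex))
```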
Rows: 5,580
Columns: 2
$ hex_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ geometry <POLYGON [m]> POLYGON ((3720.122 26626.44..., POLYGON ((3720.122 27…
This code utilizes the st_make_grid function to create a hexagon grid based on the specified parameters. The resulting hexagon grid is then converted into a spatial dataframe (st_sf()) to maintain its geospatial properties. The rowid_to_column function is employed to assign a unique identifier (hex_id) to each hexagon, facilitating subsequent analyses or referencing.
Given that each hexagonal area may encompass multiple bus stops, the hex_id serves as the primary key for aggregation. The subsequent procedures outline how to organize attributes based on hex_id:
The process uses:
the st_join() function with join = st_within to associate bus stop points with hexagon areas.
st_set_geometry(NULL) to eliminate the geospatial layer, focusing on attribute aggregation.
group_by() to obtain a unique hex_id for each row.
summarise() to calculate the aggregate count of bus stops and trips per peak period within each hexagon area.
replace_na() to fill NA values with 0 in numeric columns and "" in character columns.
Output: Attribute Dataframe, with hex_id as the Primary Key
odb10_stops <- st_join(
odb10_sf,
odb10_hex,
join = st_within
) %>%
st_set_geometry(NULL) %>%
group_by(hex_id) %>%
summarise(
n_busstops = n(),
busstops_code = str_c(origin_pt_code, collapse = ","),
loc_desc = str_c(loc_desc, collapse = ","),
`Weekday morning peak` = sum(`Weekday morning peak`, na.rm = TRUE),
`Weekday evening peak` = sum(`Weekday evening peak`, na.rm = TRUE),
`Weekend/holiday morning peak` = sum(`Weekend/holiday morning peak`, na.rm = TRUE),
`Weekend/holiday evening peak` = sum(`Weekend/holiday evening peak`, na.rm = TRUE)
) %>%
ungroup() %>%
mutate(across(where(is.numeric), ~ replace_na(., 0)),
across(where(is.character), ~ replace_na(., "")))
# check the output
glimpse(odb10_stops)
Rows: 1,524
Columns: 8
$ hex_id <int> 34, 65, 99, 127, 129, 130, 131, 159, 16…
$ n_busstops <int> 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, …
$ busstops_code <chr> "25059", "25751", "26379", "25761", "25…
$ loc_desc <chr> "AFT TUAS STH BLVD", "BEF TUAS STH AVE …
$ `Weekday morning peak` <dbl> 103, 52, 78, 185, 1116, 53, 60, 64, 251…
$ `Weekday evening peak` <dbl> 390, 114, 291, 1905, 2899, 241, 368, 29…
$ `Weekend/holiday morning peak` <dbl> 0, 26, 52, 187, 455, 76, 45, 21, 39, 69…
$ `Weekend/holiday evening peak` <dbl> 56, 14, 100, 346, 634, 55, 49, 53, 132,…
The process uses:
left_join to combine the new odb10_stops attribute dataframe with the existing odb10_hex hexagon geospatial layer, linking them by hex_id to integrate attributes into the spatial polygon dataframe.
filter to exclude hexagons without bus stops.
Output: Spatial Polygon Dataframe
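The join-and-filter chunk itself is not reproduced; its two steps (left join, then drop hexagons without bus stops) can be sketched in base R on toy rows, while the glimpse output below reflects the actual dataset:

```r
hexes <- data.frame(hex_id = 1:3)
stops <- data.frame(hex_id = c(1, 3), n_busstops = c(2, 1))
# Left join keeps all hexagons; filtering then drops those without bus stops
bustraffic10 <- merge(hexes, stops, by = "hex_id", all.x = TRUE)
bustraffic10 <- bustraffic10[!is.na(bustraffic10$n_busstops), , drop = FALSE]
```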
Rows: 1,524
Columns: 9
$ hex_id <int> 34, 65, 99, 127, 129, 130, 131, 159, 16…
$ n_busstops <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, …
$ busstops_code <chr> "25059", "25751", "26379", "25761", "25…
$ loc_desc <chr> "AFT TUAS STH BLVD", "BEF TUAS STH AVE …
$ `Weekday morning peak` <dbl> 103, 52, 78, 185, 1116, 53, 60, 64, 251…
$ `Weekday evening peak` <dbl> 390, 114, 291, 1905, 2899, 241, 368, 29…
$ `Weekend/holiday morning peak` <dbl> 0, 26, 52, 187, 455, 76, 45, 21, 39, 69…
$ `Weekend/holiday evening peak` <dbl> 56, 14, 100, 346, 634, 55, 49, 53, 132,…
$ geometry <POLYGON [m]> POLYGON ((3970.122 27925.48...,…
In Step 2, the code outlines a process for associating bus stop attributes with hexagonal areas using st_join. This involves grouping and summarizing the attributes based on the unique hex_id. The resulting dataframe, odb10_stops, serves as an attribute-centric representation with the hex_id as the primary key.
Moving to Step 3, the code then combines this attribute dataframe with the original hexagon geospatial layer, odb10_hex, utilizing left_join. The resulting spatial polygon dataframe, bustraffic10, integrates both geometric and attribute information. The filter operation ensures that only hexagons containing bus stops are retained in the final dataframe.
# Plot the hexagon grid based on n_busstops
tmap_mode("view")
busstop_map = tm_shape(odb10_final)+
tm_fill(
col = "n_busstops",
palette = "Blues",
style = "cont",
title = "Number of Bus Stops",
id = "loc_desc",
showNA = FALSE,
alpha = 0.6,
popup.format = list(
grid_id = list(format = "f", digits = 0)
)
)+
tm_borders(col = "grey40", lwd = 0.7)
busstop_map
From the map, we can infer that the central region, likely encompassing the Central Business District (CBD) and surrounding residential areas, which are known to be highly populated and active, has a higher number of bus stops. This correlates with the need for robust public transportation in densely populated areas to support the daily commute of residents and workers. Lighter shades are observed towards the periphery of the island, suggesting fewer bus stops, which could correspond to less urbanized or industrial areas, like the Western and North-Eastern regions where the population density is lower. The map reflects Singapore’s urban planning and transportation strategy, which aims to provide accessibility and connectivity, particularly in high-density regions where demand for public transit is greatest.
To find suitable breaks for plotting the trips, we first inspect the distribution with summary().
hex_id n_busstops busstops_code loc_desc
Min. : 34 Min. : 1.000 Length:1524 Length:1524
1st Qu.:1941 1st Qu.: 2.000 Class :character Class :character
Median :2950 Median : 3.000 Mode :character Mode :character
Mean :2813 Mean : 3.376
3rd Qu.:3734 3rd Qu.: 4.000
Max. :5558 Max. :11.000
Weekday morning peak Weekday evening peak Weekend/holiday morning peak
Min. : 0 Min. : 0 Min. : 0.0
1st Qu.: 909 1st Qu.: 2139 1st Qu.: 384.8
Median : 7532 Median : 7246 Median : 2159.5
Mean : 16838 Mean : 16135 Mean : 4986.7
3rd Qu.: 23245 3rd Qu.: 16947 3rd Qu.: 6371.0
Max. :386450 Max. :542529 Max. :109329.0
Weekend/holiday evening peak geometry
Min. : 0.0 POLYGON :1524
1st Qu.: 529.5 epsg:3414 : 0
Median : 2157.5 +proj=tmer...: 0
Mean : 4999.7
3rd Qu.: 5458.2
Max. :150399.0
The summary results confirm that the trips data is right-skewed and contains extreme outliers. This knowledge is then used to customize the breaks in the comparison maps.
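For a right-skewed variable like this, evenly spaced breaks would lump most hexagons into the lowest class, so quantiles are a useful guide to where the mass of the distribution sits. A generic sketch (the probabilities are illustrative, not the exact breaks used below):

```r
# Inspect quantiles of a skewed peak-period column to motivate
# non-linear class breaks for the choropleth maps
x <- odb10_final$`Weekday morning peak`
quantile(x, probs = c(0, 0.25, 0.5, 0.75, 0.9, 0.99, 1), na.rm = TRUE)
```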
The following code plots the comparison maps.
# Define the columns for which we want to find the global min and max
value_columns <- c("Weekday morning peak", "Weekday evening peak", "Weekend/holiday morning peak", "Weekend/holiday evening peak")
# Set tmap to interactive viewing mode
tmap_mode("view")
# Define a function to create each map with a customized legend
create_map <- function(value_column) {
tm_shape(odb10_final) +
tm_fill(
col = value_column,
palette = "-RdBu",
style = "fixed",
title = value_column,
id = "loc_desc",
showNA = FALSE,
alpha = 0.6,
breaks = c(0, 500, 1000, 2000, 3000, 5000, 10000, 50000, 100000, 300000, 550000)
) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(
legend.position = c("right", "bottom"), # Adjust legend position
legend.frame = TRUE,
legend.width = 0.15, # Adjust the width of the legend
legend.height = 0.5 # Adjust the height of the legend
)
}
# Apply the function to each value column and store the maps
map_list <- lapply(value_columns, create_map)
# Combine the maps into a 2x2 grid
combined_map <- tmap_arrange(map_list[[1]], map_list[[2]], map_list[[3]], map_list[[4]], ncol = 1, nrow = 4)
# Show the combined map
combined_map
Using the same classification scale for all maps clearly shows that weekday trip volumes are generally higher than weekend/holiday volumes. Nevertheless, at a glance, the crowded areas are more or less the same across all days and times. This confirms the earlier finding on bus stops: the areas with the most traffic likely encompass the Central Business District (CBD) and surrounding residential areas, which are among the most populated and active parts of Singapore. It also reflects Singapore’s urban planning and transportation strategy, in which the busiest areas with potentially high traffic are supported by more bus stops.
The next visualization examines which bus stops are the busiest on average, in terms of the number of trips per bus stop. To do that, the following code first generates new trips-per-bus-stop columns for each hexagon.
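This step can be sketched with dplyr::across. A minimal sketch: `odb10_final` and the four peak column names are taken from the maps above, and the result is assumed to be stored as `odb10_final_avg` (the object plotted below).

```r
library(dplyr)

peak_cols <- c("Weekday morning peak", "Weekday evening peak",
               "Weekend/holiday morning peak", "Weekend/holiday evening peak")

# Divide each peak-period trip count by the number of bus stops in the
# hexagon, storing the result in new avg_* columns
odb10_final_avg <- odb10_final %>%
  mutate(across(all_of(peak_cols),
                ~ .x / n_busstops,
                .names = "avg_{.col}"))
```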
hex_id n_busstops busstops_code loc_desc
Min. : 34 Min. : 1.000 Length:1524 Length:1524
1st Qu.:1941 1st Qu.: 2.000 Class :character Class :character
Median :2950 Median : 3.000 Mode :character Mode :character
Mean :2813 Mean : 3.376
3rd Qu.:3734 3rd Qu.: 4.000
Max. :5558 Max. :11.000
Weekday morning peak Weekday evening peak Weekend/holiday morning peak
Min. : 0 Min. : 0 Min. : 0.0
1st Qu.: 909 1st Qu.: 2139 1st Qu.: 384.8
Median : 7532 Median : 7246 Median : 2159.5
Mean : 16838 Mean : 16135 Mean : 4986.7
3rd Qu.: 23245 3rd Qu.: 16947 3rd Qu.: 6371.0
Max. :386450 Max. :542529 Max. :109329.0
Weekend/holiday evening peak geometry avg_Weekday morning peak
Min. : 0.0 POLYGON :1524 Min. : 0
1st Qu.: 529.5 epsg:3414 : 0 1st Qu.: 391
Median : 2157.5 +proj=tmer...: 0 Median : 2598
Mean : 4999.7 Mean : 4513
3rd Qu.: 5458.2 3rd Qu.: 6119
Max. :150399.0 Max. :119816
avg_Weekday evening peak avg_Weekend/holiday morning peak
Min. : 0.0 Min. : 0.0
1st Qu.: 950.8 1st Qu.: 162.8
Median : 2284.0 Median : 775.0
Mean : 4420.4 Mean : 1349.9
3rd Qu.: 4575.0 3rd Qu.: 1659.1
Max. :136001.0 Max. :43420.0
avg_Weekend/holiday evening peak
Min. : 0.0
1st Qu.: 224.1
Median : 711.1
Mean : 1380.3
3rd Qu.: 1468.9
Max. :39425.0
The following code plots the comparison maps.
# Define the columns for which we want to find the global min and max
value_columns <- c("avg_Weekday morning peak", "avg_Weekday evening peak", "avg_Weekend/holiday morning peak", "avg_Weekend/holiday evening peak")
# Set tmap to interactive viewing mode
tmap_mode("view")
# Define a function to create each map with a customized legend
create_map <- function(value_column) {
tm_shape(odb10_final_avg) +
tm_fill(
col = value_column,
palette = "-RdBu",
style = "fixed",
title = value_column,
id = "loc_desc",
showNA = FALSE,
alpha = 0.6,
breaks = c(0, 100, 200, 300, 400, 500, 750, 1000, 1500, 5000, 10000, 50000, 140000)
) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(
legend.position = c("right", "bottom"), # Adjust legend position
legend.frame = TRUE,
legend.width = 0.15, # Adjust the width of the legend
legend.height = 0.5 # Adjust the height of the legend
)
}
# Apply the function to each value column and store the maps
map_list <- lapply(value_columns, create_map)
# Combine the maps into a 2x2 grid
combined_map <- tmap_arrange(map_list[[1]], map_list[[2]], map_list[[3]], map_list[[4]], ncol = 1, nrow = 4)
# Show the combined map
combined_map
At a glance, using the average trips per bus stop reveals a slightly different insight. Compared with total trips, the averages show that the areas around Jurong and Woodlands, and the bus stops in Johor (part of Malaysia), are actually busier than the totals suggest. In the context of transport policy, this might be a first lead for expanding the number of bus stops in those areas to cater to more commuters.
In this section, we will employ Local Indicators of Spatial Association (LISA) statistics to explore the presence of clusters or outliers in the total number of trips generated at the hexagon layer. Specifically, we will utilize Moran’s I statistic to detect spatial patterns such as clustering or dispersion of total trips.
Before delving into global/local spatial autocorrelation statistics, we need to create a spatial weights matrix for our study area. This matrix defines the neighbors of hexagonal spatial units based on their distances. Here are some considerations:
Each feature should have at least one neighbor, and no feature should be a neighbor with all other features.
For skewed data, each feature should ideally have at least 8 neighbors (12 is used here).
Given the sparse distribution of bus stops in certain regions (e.g., central catchment areas, military training areas, and airports), distance-based methods are favored over contiguity methods.
The Adaptive Distance-Based (fixed number of neighbors) method is chosen over a fixed-distance threshold because of the right-skewed nature of our data. With this method, each hexagon is guaranteed at least n neighbors, which facilitates later statistical significance testing.
We will set each hexagon to have 12 neighbors using the provided R code. The process involves using st_knn to obtain a list of neighbors, and then st_weights to generate row-standardized spatial weights.
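A sketch of this weights construction with sfdep (the joined table holding these columns is referred to as `wm_all` later in this section):

```r
library(sf)
library(sfdep)
library(dplyr)

# k = 12 nearest neighbours per hexagon, then row-standardised weights
# (style = "W"), so each of the 12 neighbours gets weight 1/12
wm_all <- bustraffic10 %>%
  mutate(
    nb = st_knn(geometry, k = 12),
    wt = st_weights(nb, style = "W"),
    .before = 1
  )
```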
| nb | wt | hex_id | n_busstops | busstops_code | loc_desc | Weekday morning peak | Weekday evening peak | Weekend/holiday morning peak | Weekend/holiday evening peak | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 2, 4, 5, 8, 9, 12, 13, 16, 22, 23, 38, 39 | 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333 | 34 | 1 | 25059 | AFT TUAS STH BLVD | 103 | 390 | 0 | 56 | POLYGON ((3970.122 27925.48… |
| 1, 3, 4, 5, 8, 9, 12, 13, 16, 22, 23, 38 | 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333 | 65 | 1 | 25751 | BEF TUAS STH AVE 14 | 52 | 114 | 26 | 14 | POLYGON ((4220.122 28358.49… |
| 5, 6, 7, 9, 10, 13, 14, 16, 17, 18, 24, 25 | 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333 | 99 | 1 | 26379 | YONG NAM | 78 | 291 | 52 | 100 | POLYGON ((4470.122 30523.55… |
| 1, 2, 5, 8, 9, 12, 13, 16, 22, 23, 38, 39 | 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333 | 127 | 1 | 25761 | OPP REC S’PORE | 185 | 1905 | 187 | 346 | POLYGON ((4720.122 28358.49… |
| 3, 6, 9, 10, 12, 13, 14, 16, 17, 24, 30, 31 | 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333 | 129 | 2 | 25719,26389 | THE INDEX,BEF TUAS STH AVE 5 | 1116 | 2899 | 455 | 634 | POLYGON ((4720.122 30090.54… |
| 3, 5, 7, 9, 10, 11, 13, 14, 17, 18, 24, 25 | 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333 | 130 | 1 | 26369 | SEE HUP SENG | 53 | 241 | 76 | 55 | POLYGON ((4720.122 30956.57… |
Global spatial association evaluates the overall spatial pattern of a variable, in this case, the total number of trips across the entire study area. It provides a single metric summarizing the extent to which similar values cluster together or disperse across the geographic space.
We compute Global Moran’s I for the trips generated during each peak period at the hexagon level, using permutation-based (simulated) inference to avoid assumptions of normality and randomness. The number of simulations is set to 99 + 1 = 100.
# Set the seed to ensure reproducibility
set.seed(1234)
# define the value_columns
value_columns <- c("Weekday morning peak", "Weekday evening peak", "Weekend/holiday morning peak", "Weekend/holiday evening peak")
# Create a function to perform global Moran's I test
perform_global_moran <- function(data, value_column, k) {
# Compute spatial weights
nb <- st_knn(data$geometry, k = k)
wt <- st_weights(nb, style = 'W')
# Perform global Moran's I test
moran_result <- global_moran_perm(data[[value_column]], nb, wt, nsim = 99)
# Include the value_column in the result
result <- list(
value_column = value_column,
moran_result = moran_result
)
return(result)
}
# Apply the function for each time interval
results <- lapply(value_columns, function(vc) perform_global_moran(wm_all, vc, k = 12))
# Print the results
print(results)
[[1]]
[[1]]$value_column
[1] "Weekday morning peak"
[[1]]$moran_result
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.18614, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
[[2]]
[[2]]$value_column
[1] "Weekday evening peak"
[[2]]$moran_result
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.044306, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
[[3]]
[[3]]$value_column
[1] "Weekend/holiday morning peak"
[[3]]$moran_result
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.13804, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
[[4]]
[[4]]$value_column
[1] "Weekend/holiday evening peak"
[[4]]$moran_result
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.084188, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
For all four time intervals, since the p-value for global Moran’s I is smaller than 0.05, we can reject the null hypothesis that the spatial patterns are random. Moreover, as the Moran’s I statistics are greater than 0, we can infer that the spatial distribution exhibits signs of clustering for all four time intervals, consistent with the choropleth maps plotted earlier.
Local spatial association provides a more detailed examination of spatial patterns at the local level, identifying specific areas with strong or weak spatial association. Local Moran’s I categorizes regions as high-high, low-low, high-low, or low-high, indicating clustering or outlier behavior.
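Conceptually, the four categories come from comparing each location's deviation from the mean against the spatially lagged deviation of its neighbours (sfdep's local_moran() returns such labels in its mean/median/pysal columns). The hypothetical helper below illustrates the classification rule only; it is not the function used in this exercise.

```r
library(dplyr)

# Hypothetical helper: assign LISA quadrants by comparing each value's
# deviation from the mean with its spatial lag's deviation from the mean
lisa_quadrant <- function(x, x_lag) {
  dx   <- x - mean(x)       # own deviation
  dlag <- x_lag - mean(x)   # neighbours' average deviation
  case_when(
    dx >= 0 & dlag >= 0 ~ "High-High",  # cluster of high values
    dx <  0 & dlag <  0 ~ "Low-Low",    # cluster of low values
    dx >= 0 & dlag <  0 ~ "High-Low",   # high outlier among low neighbours
    TRUE                ~ "Low-High"    # low outlier among high neighbours
  )
}
```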
We compute Local Indicators of Spatial Association (LISA) for passenger trips generated by origin at the hexagon level. The local_moran function from the sfdep package is used with permutation-based (simulated) inference, with the neighbor lists and weights computed via st_knn and st_weights as before. The results are then unnested for further analysis and displayed in an interactive datatable.
# Create a function to perform local Moran's I analysis
get_lisa <- function(wm, value_column, k) {
# Compute spatial weights
nb <- st_knn(wm$geometry, k = k)
wt <- st_weights(nb, style = 'W')
# Perform local Moran's I analysis and create a new data frame
result <- wm %>%
mutate(
local_moran = local_moran(.data[[value_column]], nb, wt, nsim = 99),
.before = 1
) %>%
unnest(local_moran)
return(result)
}
# Initialize a list to store results for each value_column
lisa_results <- list()
# Apply the function for each time interval and store results in the list
for (vc in value_columns) {
lisa_results[[vc]] <- get_lisa(wm_all, vc, k = 12)
# Remove columns that don't belong to the specific time interval
unwanted_columns <- setdiff(value_columns, vc)
lisa_results[[vc]] <- lisa_results[[vc]][, !(names(lisa_results[[vc]]) %in% unwanted_columns)]
}
# show sample output in an interactive table
datatable(lisa_results[["Weekday morning peak"]],
class = 'cell-border stripe',
          options = list(pageLength = 5))
Next, we use tmap core functions to construct choropleth maps displaying the Local Moran’s I values and their p-values for all four time intervals. Only Local Moran’s I values that are significant at the 95% confidence level are plotted.
get_sig_lmi_map <- function(lisa_table, title) {
sig_lisa_table <- lisa_table %>%
filter(p_ii_sim < 0.05)
result <- tm_shape(lisa_table) +
tm_polygons() +
tm_borders(alpha = 0.5) +
tm_shape(sig_lisa_table) +
tm_fill("ii") +
tm_borders(alpha = 0.4) +
tm_layout(
main.title = title,
main.title.size = 1.3
)
return(result)
}
sig_lmi_1 <- get_sig_lmi_map(lisa_results[["Weekday morning peak"]], "Local Moran's I of Total Trips generated on Weekday Morning at 95% CL" )
sig_lmi_2 <- get_sig_lmi_map(lisa_results[["Weekday evening peak"]], "Local Moran's I of Total Trips generated on Weekday Afternoon at 95% CL" )
sig_lmi_3 <- get_sig_lmi_map(lisa_results[["Weekend/holiday morning peak"]], "Local Moran's I of Total Trips generated on Weekend Morning at 95% CL" )
sig_lmi_4 <- get_sig_lmi_map(lisa_results[["Weekend/holiday evening peak"]], "Local Moran's I of Total Trips generated on Weekend Afternoon at 95% CL" )
tmap_mode('plot')
tmap_arrange(
sig_lmi_1,
sig_lmi_2,
sig_lmi_3,
sig_lmi_4,
asp = 2,
nrow = 2,
ncol = 2
)
The choropleth maps displaying Local Moran’s I reveal that darker orange and darker green areas signify outlier regions. Classifying them into low-high or high-low clusters requires further analysis using the LISA categories. Similarly, the light green areas indicate either high-high or low-low regions. Notably, the area around Tuas and Jurong is most likely a low-low area based on the previous geovisualization.
get_sig_lisa_map <- function(lisatable, title) {
sig_lisatable <- lisatable %>%
filter(p_ii_sim < 0.05)
result <- tm_shape(lisatable) +
tm_polygons(alpha = 0.5) +
tm_borders(alpha = 0.5) +
tm_shape(sig_lisatable) +
tm_fill("median",
id = "loc_desc",
palette = c("blue", "lightcoral", "lightblue", "red"),
alpha= 0.7) +
tm_borders(alpha = 0.4) +
tm_layout(main.title = title,
main.title.size = 1.5,
legend.position = c("left", "top"))
return(result)
}
sig_lisa_1 <- get_sig_lisa_map(lisa_results[["Weekday morning peak"]],"LISA categories generated on Weekday Morning at 95% CL" )
sig_lisa_2 <- get_sig_lisa_map(lisa_results[["Weekday evening peak"]], "LISA categories generated on Weekday Afternoon at 95% CL" )
sig_lisa_3 <- get_sig_lisa_map(lisa_results[["Weekend/holiday morning peak"]], "LISA categories generated on Weekend Morning at 95% CL" )
sig_lisa_4 <- get_sig_lisa_map(lisa_results[["Weekend/holiday evening peak"]], "LISA categories generated on Weekend Afternoon at 95% CL" )